Overview

Dataset statistics

Number of variables: 13
Number of observations: 380
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 38.7 KiB
Average record size in memory: 104.3 B
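
The average record size appears to be the total in-memory size divided by the number of observations. A quick check of that arithmetic, assuming 1 KiB = 1024 bytes and using the rounded figures printed above (the variable names are illustrative):

```python
# Figures copied from the dataset statistics above (already rounded in the report).
total_size_kib = 38.7
n_observations = 380

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

Because the total size is itself rounded to one decimal, the result can only be expected to agree with the reported value to about one decimal place.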

Variable types

Numeric: 10
Categorical: 3

Alerts

HS is highly correlated with HST (High correlation)
AS is highly correlated with AST (High correlation)
HST is highly correlated with HS (High correlation)
AST is highly correlated with AS (High correlation)
AS is highly correlated with AST and 1 other field (High correlation)
HF is highly correlated with HY (High correlation)
AC is highly correlated with AS (High correlation)
HY is highly correlated with HF (High correlation)
AST has 4 (1.1%) zeros (Zeros)
HC has 6 (1.6%) zeros (Zeros)
AC has 6 (1.6%) zeros (Zeros)
HY has 94 (24.7%) zeros (Zeros)
AY has 64 (16.8%) zeros (Zeros)

Reproduction

Analysis started: 2022-04-05 05:13:36.430439
Analysis finished: 2022-04-05 05:13:57.596067
Duration: 21.17 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json
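
The report names pandas-profiling v3.1.1 as the software used. A minimal sketch of how a comparable report could be generated, assuming the data lives in a CSV file: the `build_report` helper, the file paths and the report title are hypothetical, while the column names are the thirteen variables profiled in this report.

```python
# Column names as they appear in this report.
COLUMNS = ["HS", "AS", "HST", "AST", "HF", "AF", "HC", "AC",
           "HY", "AY", "HR", "AR", "FTR"]


def build_report(csv_path, out_path="report.html"):
    """Profile the columns above and write an HTML report.

    Hypothetical helper: csv_path and out_path are placeholders, not
    the files used for the original analysis.
    """
    # Imported here so the module still loads where the libraries are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path, usecols=COLUMNS)
    profile = ProfileReport(df, title="Match statistics profile")
    profile.to_file(out_path)
    return profile
```

Calling `build_report("matches.csv")` with a suitable file would write `report.html`; exact figures may differ from those shown here if the library version or configuration differs.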

Variables

HS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 28
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.80263158
Minimum: 1
Maximum: 32
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 6
Q1: 11
Median: 13.5
Q3: 17
95th percentile: 23
Maximum: 32
Range: 31
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.77244658
Coefficient of variation (CV): 0.3457635272
Kurtosis: 0.3982875386
Mean: 13.80263158
Median Absolute Deviation (MAD): 3.5
Skewness: 0.347270232
Sum: 5245
Variance: 22.77624635
Monotonicity: Not monotonic
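
Several of the HS figures follow from one another, which gives a quick consistency check. A small sketch using the values as printed in this report; the variable names are illustrative, and the relations are the standard textbook definitions rather than anything taken from the pandas-profiling source:

```python
# Values copied from the HS statistics in this report.
n_observations = 380
total = 5245                 # Sum
std = 4.77244658             # Standard deviation
q1, q3 = 11, 17              # Q1 and Q3
minimum, maximum = 1, 32

mean = total / n_observations        # Mean = Sum / number of observations
cv = std / mean                      # Coefficient of variation = std / mean
variance = std ** 2                  # Variance = std squared
iqr = q3 - q1                        # Interquartile range = Q3 - Q1
value_range = maximum - minimum      # Range = Maximum - Minimum

print(f"mean={mean:.8f} cv={cv:.10f} variance={variance:.8f}")
print(f"iqr={iqr} range={value_range}")
```

The same relations can be applied to the other numeric variables by substituting their printed values.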
Histogram with fixed size bins (bins=28)
Value              Count  Frequency (%)
13                 39     10.3%
14                 36     9.5%
11                 34     8.9%
17                 29     7.6%
12                 27     7.1%
15                 25     6.6%
16                 25     6.6%
10                 24     6.3%
9                  20     5.3%
19                 18     4.7%
Other values (18)  103    27.1%

Value  Count  Frequency (%)
1      1      0.3%
2      1      0.3%
3      2      0.5%
4      3      0.8%
5      3      0.8%
6      12     3.2%
7      11     2.9%
8      13     3.4%
9      20     5.3%
10     24     6.3%

Value  Count  Frequency (%)
32     1      0.3%
27     2      0.5%
26     3      0.8%
25     1      0.3%
24     6      1.6%
23     8      2.1%
22     3      0.8%
21     6      1.6%
20     11     2.9%
19     18     4.7%

AS
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 23
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.06315789
Minimum: 2
Maximum: 25
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 2
5th percentile: 4
Q1: 8
Median: 11
Q3: 13
95th percentile: 18
Maximum: 25
Range: 23
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.215339562
Coefficient of variation (CV): 0.3810249842
Kurtosis: 0.3009830451
Mean: 11.06315789
Median Absolute Deviation (MAD): 3
Skewness: 0.3844329432
Sum: 4204
Variance: 17.76908763
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)
Value              Count  Frequency (%)
10                 40     10.5%
11                 40     10.5%
13                 38     10.0%
12                 36     9.5%
9                  32     8.4%
7                  31     8.2%
14                 21     5.5%
8                  18     4.7%
16                 18     4.7%
6                  17     4.5%
Other values (13)  89     23.4%

Value  Count  Frequency (%)
2      3      0.8%
3      7      1.8%
4      11     2.9%
5      14     3.7%
6      17     4.5%
7      31     8.2%
8      18     4.7%
9      32     8.4%
10     40     10.5%
11     40     10.5%

Value  Count  Frequency (%)
25     1      0.3%
24     3      0.8%
23     2      0.5%
21     2      0.5%
20     2      0.5%
19     7      1.8%
18     7      1.8%
17     15     3.9%
16     18     4.7%
15     15     3.9%

HST
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 19
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.657894737
Minimum: 0
Maximum: 18
Zeros: 1
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 5
Median: 7
Q3: 10
95th percentile: 14
Maximum: 18
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.428703269
Coefficient of variation (CV): 0.4477344476
Kurtosis: -0.3033393372
Mean: 7.657894737
Median Absolute Deviation (MAD): 2
Skewness: 0.3379770372
Sum: 2910
Variance: 11.75600611
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value              Count  Frequency (%)
7                  52     13.7%
8                  42     11.1%
5                  38     10.0%
6                  35     9.2%
9                  31     8.2%
10                 30     7.9%
4                  29     7.6%
11                 24     6.3%
12                 23     6.1%
3                  23     6.1%
Other values (9)   53     13.9%

Value  Count  Frequency (%)
0      1      0.3%
1      5      1.3%
2      14     3.7%
3      23     6.1%
4      29     7.6%
5      38     10.0%
6      35     9.2%
7      52     13.7%
8      42     11.1%
9      31     8.2%

Value  Count  Frequency (%)
18     1      0.3%
17     1      0.3%
16     4      1.1%
15     7      1.8%
14     9      2.4%
13     11     2.9%
12     23     6.1%
11     24     6.3%
10     30     7.9%
9      31     8.2%

AST
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 16
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.955263158
Minimum: 0
Maximum: 20
Zeros: 4
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 4
Median: 6
Q3: 8
95th percentile: 11
Maximum: 20
Range: 20
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.956250689
Coefficient of variation (CV): 0.4964097489
Kurtosis: 0.6620360997
Mean: 5.955263158
Median Absolute Deviation (MAD): 2
Skewness: 0.4886485985
Sum: 2263
Variance: 8.739418136
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value              Count  Frequency (%)
7                  51     13.4%
4                  48     12.6%
5                  48     12.6%
6                  44     11.6%
3                  32     8.4%
9                  32     8.4%
2                  31     8.2%
8                  30     7.9%
10                 23     6.1%
1                  14     3.7%
Other values (6)   27     7.1%

Value  Count  Frequency (%)
0      4      1.1%
1      14     3.7%
2      31     8.2%
3      32     8.4%
4      48     12.6%
5      48     12.6%
6      44     11.6%
7      51     13.4%
8      30     7.9%
9      32     8.4%

Value  Count  Frequency (%)
20     1      0.3%
14     3      0.8%
13     3      0.8%
12     3      0.8%
11     13     3.4%
10     23     6.1%
9      32     8.4%
8      30     7.9%
7      51     13.4%
6      44     11.6%

HF
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 19
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.98684211
Minimum: 3
Maximum: 21
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 3
5th percentile: 6
Q1: 8
Median: 11
Q3: 13
95th percentile: 17
Maximum: 21
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.450721392
Coefficient of variation (CV): 0.3140776357
Kurtosis: -0.5250702246
Mean: 10.98684211
Median Absolute Deviation (MAD): 3
Skewness: 0.2594516024
Sum: 4175
Variance: 11.90747813
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value              Count  Frequency (%)
11                 39     10.3%
12                 38     10.0%
10                 38     10.0%
13                 37     9.7%
9                  36     9.5%
15                 35     9.2%
8                  34     8.9%
7                  33     8.7%
6                  24     6.3%
14                 16     4.2%
Other values (9)   50     13.2%

Value  Count  Frequency (%)
3      1      0.3%
4      2      0.5%
5      9      2.4%
6      24     6.3%
7      33     8.7%
8      34     8.9%
9      36     9.5%
10     38     10.0%
11     39     10.3%
12     38     10.0%

Value  Count  Frequency (%)
21     1      0.3%
20     1      0.3%
19     6      1.6%
18     4      1.1%
17     10     2.6%
16     16     4.2%
15     35     9.2%
14     16     4.2%
13     37     9.7%
12     38     10.0%

AF
Real number (ℝ≥0)

Distinct: 21
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.42894737
Minimum: 2
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 2
5th percentile: 6
Q1: 9
Median: 11
Q3: 14
95th percentile: 18
Maximum: 24
Range: 22
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.481228053
Coefficient of variation (CV): 0.304597435
Kurtosis: 0.2190760594
Mean: 11.42894737
Median Absolute Deviation (MAD): 2
Skewness: 0.219575488
Sum: 4343
Variance: 12.11894876
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
Value              Count  Frequency (%)
10                 50     13.2%
11                 45     11.8%
13                 40     10.5%
12                 38     10.0%
9                  36     9.5%
14                 35     9.2%
8                  30     7.9%
15                 17     4.5%
7                  16     4.2%
16                 15     3.9%
Other values (11)  58     15.3%

Value  Count  Frequency (%)
2      1      0.3%
3      3      0.8%
4      4      1.1%
5      9      2.4%
6      9      2.4%
7      16     4.2%
8      30     7.9%
9      36     9.5%
10     50     13.2%
11     45     11.8%

Value  Count  Frequency (%)
24     1      0.3%
21     1      0.3%
20     2      0.5%
19     7      1.8%
18     10     2.6%
17     11     2.9%
16     15     3.9%
15     17     4.5%
14     35     9.2%
13     40     10.5%

HC
Real number (ℝ≥0)

ZEROS

Distinct: 18
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.044736842
Minimum: 0
Maximum: 17
Zeros: 6
Zeros (%): 1.6%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 4
Median: 5
Q3: 8
95th percentile: 12
Maximum: 17
Range: 17
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.107671788
Coefficient of variation (CV): 0.5141120067
Kurtosis: 0.5393424423
Mean: 6.044736842
Median Absolute Deviation (MAD): 2
Skewness: 0.7041762858
Sum: 2297
Variance: 9.657623941
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Value              Count  Frequency (%)
4                  56     14.7%
5                  56     14.7%
3                  45     11.8%
8                  40     10.5%
6                  38     10.0%
7                  36     9.5%
10                 22     5.8%
9                  22     5.8%
2                  19     5.0%
11                 11     2.9%
Other values (8)   35     9.2%

Value  Count  Frequency (%)
0      6      1.6%
1      9      2.4%
2      19     5.0%
3      45     11.8%
4      56     14.7%
5      56     14.7%
6      38     10.0%
7      36     9.5%
8      40     10.5%
9      22     5.8%

Value  Count  Frequency (%)
17     1      0.3%
16     3      0.8%
15     2      0.5%
14     3      0.8%
13     3      0.8%
12     8      2.1%
11     11     2.9%
10     22     5.8%
9      22     5.8%
8      40     10.5%

AC
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 16
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.997368421
Minimum: 0
Maximum: 16
Zeros: 6
Zeros (%): 1.6%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 5
Q3: 7
95th percentile: 11
Maximum: 16
Range: 16
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.834482995
Coefficient of variation (CV): 0.5671951227
Kurtosis: 1.041808135
Mean: 4.997368421
Median Absolute Deviation (MAD): 2
Skewness: 0.9055941766
Sum: 1899
Variance: 8.034293848
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
Value              Count  Frequency (%)
4                  63     16.6%
3                  56     14.7%
5                  50     13.2%
6                  44     11.6%
2                  41     10.8%
7                  39     10.3%
1                  22     5.8%
8                  19     5.0%
11                 11     2.9%
9                  10     2.6%
Other values (6)   25     6.6%

Value  Count  Frequency (%)
0      6      1.6%
1      22     5.8%
2      41     10.8%
3      56     14.7%
4      63     16.6%
5      50     13.2%
6      44     11.6%
7      39     10.3%
8      19     5.0%
9      10     2.6%

Value  Count  Frequency (%)
16     2      0.5%
14     1      0.3%
13     3      0.8%
12     5      1.3%
11     11     2.9%
10     8      2.1%
9      10     2.6%
8      19     5.0%
7      39     10.3%
6      44     11.6%

HY
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.413157895
Minimum: 0
Maximum: 7
Zeros: 94
Zeros (%): 24.7%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 1
Q3: 2
95th percentile: 3
Maximum: 7
Range: 7
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.16017264
Coefficient of variation (CV): 0.8209787772
Kurtosis: 0.9185688125
Mean: 1.413157895
Median Absolute Deviation (MAD): 1
Skewness: 0.7573984279
Sum: 537
Variance: 1.346000555
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value  Count  Frequency (%)
1      120    31.6%
2      102    26.8%
0      94     24.7%
3      49     12.9%
4      11     2.9%
5      3      0.8%
7      1      0.3%

Value  Count  Frequency (%)
0      94     24.7%
1      120    31.6%
2      102    26.8%
3      49     12.9%
4      11     2.9%
5      3      0.8%
7      1      0.3%

Value  Count  Frequency (%)
7      1      0.3%
5      3      0.8%
4      11     2.9%
3      49     12.9%
2      102    26.8%
1      120    31.6%
0      94     24.7%

AY
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.839473684
Minimum: 0
Maximum: 7
Zeros: 64
Zeros (%): 16.8%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 2
Q3: 3
95th percentile: 4
Maximum: 7
Range: 7
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.30475879
Coefficient of variation (CV): 0.7093109303
Kurtosis: 0.2904893543
Mean: 1.839473684
Median Absolute Deviation (MAD): 1
Skewness: 0.543688834
Sum: 699
Variance: 1.702395501
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value  Count  Frequency (%)
2      113    29.7%
1      94     24.7%
3      72     18.9%
0      64     16.8%
4      26     6.8%
5      8      2.1%
6      2      0.5%
7      1      0.3%

Value  Count  Frequency (%)
0      64     16.8%
1      94     24.7%
2      113    29.7%
3      72     18.9%
4      26     6.8%
5      8      2.1%
6      2      0.5%
7      1      0.3%

Value  Count  Frequency (%)
7      1      0.3%
6      2      0.5%
5      8      2.1%
4      26     6.8%
3      72     18.9%
2      113    29.7%
1      94     24.7%
0      64     16.8%

HR
Categorical

Distinct: 2
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value  Count
0      351
1      29

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 380
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
0      351    92.4%
1      29     7.6%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      351    92.4%
1      29     7.6%

Most occurring characters

Value  Count  Frequency (%)
0      351    92.4%
1      29     7.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  380    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      351    92.4%
1      29     7.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  380    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      351    92.4%
1      29     7.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  380    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      351    92.4%
1      29     7.6%

AR
Categorical

Distinct: 3
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value  Count
0      348
1      30
2      2

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 380
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      348    91.6%
1      30     7.9%
2      2      0.5%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      348    91.6%
1      30     7.9%
2      2      0.5%

Most occurring characters

Value  Count  Frequency (%)
0      348    91.6%
1      30     7.9%
2      2      0.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  380    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      348    91.6%
1      30     7.9%
2      2      0.5%

Most occurring scripts

Value   Count  Frequency (%)
Common  380    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      348    91.6%
1      30     7.9%
2      2      0.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  380    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      348    91.6%
1      30     7.9%
2      2      0.5%

FTR
Categorical

Distinct: 3
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

Value  Count
H      179
D      111
A      90

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 380
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: H
2nd row: H
3rd row: D
4th row: H
5th row: D

Common Values

Value  Count  Frequency (%)
H      179    47.1%
D      111    29.2%
A      90     23.7%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
h      179    47.1%
d      111    29.2%
a      90     23.7%

Most occurring characters

Value  Count  Frequency (%)
H      179    47.1%
D      111    29.2%
A      90     23.7%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  380    100.0%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
H      179    47.1%
D      111    29.2%
A      90     23.7%

Most occurring scripts

Value  Count  Frequency (%)
Latin  380    100.0%

Most frequent character per script

Latin
Value  Count  Frequency (%)
H      179    47.1%
D      111    29.2%
A      90     23.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  380    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
H      179    47.1%
D      111    29.2%
A      90     23.7%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
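
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names and the sample data are illustrative assumptions, and ties are given average ranks, which is a common convention rather than something this report states.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def covariance_ratio(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


def spearman_rho(x, y):
    """Spearman's rho: the covariance ratio applied to the rank variables."""
    return covariance_ratio(average_ranks(x), average_ranks(y))


# Made-up data: y grows monotonically but nonlinearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
plain = covariance_ratio(x, y)
print(rho, plain)
```

On this made-up data ρ comes out at essentially 1 because the relationship is perfectly monotonic, while the same ratio on the raw values is lower because the relationship is not linear.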

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
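
A minimal sketch of that calculation, together with a check of the location and scale invariance mentioned above. The function name and the two short lists are made up for illustration and are not rows from this dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Illustrative values only.
shots = [8, 13, 10, 17, 21, 12]
on_target = [3, 6, 4, 8, 9, 5]

r = pearson_r(shots, on_target)

# Shift and rescale each variable separately (positive scale factors).
shifted = [2 * s + 3 for s in shots]
rescaled = [0.5 * t - 1 for t in on_target]
r_transformed = pearson_r(shifted, rescaled)

print(r, r_transformed)
```

The two printed values agree up to floating-point rounding, which is what the invariance property predicts; a negative scale factor on one variable would flip the sign of r.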

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
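
The pair-counting definition above can be sketched directly. This is the simple variant often called tau-a, written from the description in this section; it is an assumption that it matches the report's own numbers, since libraries commonly use a tie-adjusted variant. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 means a tie in x or y; it counts toward neither.
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Made-up data: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

With heavily tied data, such as the card counts in this report, tau-a shrinks toward zero because tied pairs stay in the denominator; tie-adjusted variants correct for that.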

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
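
A sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013), written in plain Python. The function name and the example tables are assumptions for illustration, the code assumes every row and column total is non-zero, and it is not taken from the pandas-profiling source.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of observed counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi squared and the effective table dimensions.
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corrected / min(k_corrected - 1, r_corrected - 1))


# Made-up 2 x 2 tables: perfect association and exact independence.
perfect = cramers_v_corrected([[10, 0], [0, 10]])
independent = cramers_v_corrected([[5, 5], [5, 5]])
print(perfect, independent)
```

The two made-up tables sit at the ends of the scale: the first gives a value of essentially 1 and the second gives 0.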

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

HS AS HST AST HF AF HC AC HY AY HR AR FTR
0231211215151671200H
17172121914132100H
21312971213481300D
318101341010311000H
4613271310363310D
5221118713161030200D
611967811641100A
71310761713550200H
87144713159111311D
918710395532200H

Last rows

HS AS HST AST HF AF HC AC HY AY HR AR FTR
370109841212452200H
37116168101110651110A
3721119391214653210H
373101266125532110D
374181411785882100H
375151310758760000D
376111159109531400A
377227163515750300H
3781717121278460200A
3791213510129833100A